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Abstract. We present two algorithms that, given a prime t and an elliptic 
curve E/Fq, directly compute the polynomial &i(j(E), Y) £ [V] whose roots 
are the ^'-invariants of the elliptic curves that are £-isogenous to E. We do not 
assume that the modular polynomial &i(X, Y) is given. The algorithms may 
be adapted to handle other types of modular polynomials, and we consider 
applications to point counting and the computation of endomorphism rings. 
We demonstrate the practical efficiency of the algorithms by setting a new 
point-counting record, modulo a prime q with more than 5,000 decimal digits, 
and by evaluating a modular polynomial of level £ = 100,019. 



1. Introduction 

Isogenies play a crucial role in the theory and application of elliptic curves. 
A standard method for identifying (and computing) isogenies uses the classical 
modular polynomial $^ € Z[X, Y], which parameterizes pairs of ^-isogenous elliptic 
curves in terms of their j-invariants. More precisely, over a field F of characteristic 
not equal to I, the modular equation 

holds if and only if j\ and j2 are the j-invariants of elliptic curves defined over F 
that are related by a cyclic isogeny of degree I. In practical applications, F is 
typically a finite field ¥ q , and I is a prime, as we shall assume throughout. For the 
sake of simplicity we assume that q is prime, but this is not essential. 

A typical scenario is the following: we are given an elliptic curve E/¥ q and 
wish to determine whether E admits an ^-isogeny defined over ¥ q , and if so, to 
identify one or all of the curves that are £-isogenous to E. This can be achieved by 
computing the instantiated modular polynomial 

fa(Y) = $ i (j(E),Y)€¥ q [Y], 

and finding its roots in ¥ q (if any). Each root is the j-invariant of an elliptic curve 
that is ^-isogenous to E over ¥ q , and every such j-invariant is a root of 4>i(Y). 

For large I the main obstacle to obtaining fa is the size of which is 0(£ 3 log £) 
bits; several gigabytes for £ 10 3 , and many terabytes for £ w 10 4 , see [7, Table 1]. 
In practice, alternative modular polynomials that are smaller than $^ by a large 
constant factor are often used, but their size grows at the same rate and this quickly 
becomes the limiting factor, as noted in [10, §5.2] and elsewhere. The following 
quote is taken from the 2009 INRIA Project- Team TANC report [30, p. 9]: 

". . . computing modular polynomials remains the stumbling block for new point 
counting records. Clearly, to circumvent the memory problems, one would need an 
algorithm that directly obtains the polynomial specialized in one variable. " 
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Here we present just such an algorithm (two in fact). This is achieved by modi- 
fying the isogeny volcano approach of [7] in one of two ways. Under the generalized 
Riemann hypothesis (GRH), our first algorithm has an expected running time of 
0(£ 3 \og 3 £ llog£) and uses 0(£ 2 \og£ + £logq) space, assuming \ogq — 0{£\og£). 1 
This time complexity is the same as (and in practice faster than) the time to com- 
pute and the space complexity is reduced by up to a factor off. When logq w £, 
as in point-counting applications, the space complexity is nearly optimal, quasi- 
linear in the size of <f>£. The second algorithm uses 0{e 3 (logq + \og£)\og 1+ °We) 
time and 0(£\ogq + £ \og£) space, under the GRH. Its space complexity is optimal 
for q = Q{£), and when \ogq = 0{\og 2 ~ e £) its time complexity is better than the 
best known algorithms for computing However, for larger values of \ogq its 
running time becomes less attractive and the first algorithm may be preferred. 

Used in conjunction with the SEA algorithm, the first algorithm allows one to 
compute the cardinality of an elliptic curve modulo a prime q with a heuristic 2 
running time of (9(n 4 log 2 nllogn), using 0(n 2 logn) space, where n = logq. To our 
knowledge, all alternative approaches applicable to prime fields increase at least 
one of these bounds by a factor of n or more. The running time is competitive with 
SEA implementations that rely on precomputed modular polynomials (as can be 
found in Magma [3] and PARI [31]), and can easily handle much larger values of £. 

As an important practical optimization, we also evaluate modular polynomials 
4>{{Y) — &f(f(E), Y) defined by modular functions f(z) other than the j-function. 
This includes the Weber f-function, whose modular polynomials are smaller than 
the classical modular polynomial by a factor of 1728 and can be computed much 
more quickly (by roughly the same factor). This speedup also applies to (f>j. 

To demonstrate the capability of the new algorithms, we use a modified version 
of the SEA algorithm to count points on an elliptic curve modulo a prime of more 
than 5,000 decimal digits, and evaluate a modular polynomial of level £ = 100,019 
modulo a prime of more than 25,000 decimal digits. 

Acknowledgements. I am very grateful to David Harvey for his assistance with 
the algorithms for fast polynomial arithmetic used in the computations of §5. 

2. Background 

This section contains a brief summary of background material that can be found 
in standard references such as [20, 25, 26], or in the papers [7, 28]. For the sake of 
brevity, we recall only the results we need, and only in the generality necessary. 

For the sake of presentation, we assume throughout that F p and ¥ q denote prime 
fields with £ =/= p,q, and, where relevant, that q is sufficiently large (typically q > 2£). 
But this assumption is not needed for our main result; Algorithms 1 and 2 work 
correctly for any prime q (even q = £), and can be extended to handle non-prime q. 

2.1. Isogenics. Let E be an elliptic curve defined over a field F. Recall that an 
isogeny ip : E — > E is a morphism of elliptic curves that is also a group homomor- 
phism from E(¥) to E(¥). The kernel of a nonzero isogeny is a finite subgroup 
of E(¥), and when tp is separable, the size of its kernel is equal to its degree. Con- 
versely, every finite subgroup of E(¥) is the kernel of a separable isogeny. We say 

"^See Theorem 4 for a more precise bound. We write llogra for log logn throughout. 
2 The heuristic relates to the distribution of Elkies primes and is a standard assumption made 
when using the SEA algorithm (without it there is no advantage over Schoof's algorithm). 



ON THE EVALUATION OF MODULAR. POLYNOMIALS 



3 



that tp is cyclic if its kernel is cyclic, and call tp an ./V-isogeny when it has degree N. 
Note that an isogeny of prime degree £ 7^ char(F) is necessarily cyclic and separable. 

The classical modular polynomial $at is the minimal polynomial of the function 
j(Nz) over the field C(j), where j(z) is the modular j-function. As a polynomial 
in two variables, $^6 Z[X,Y] is symmetric in X and Y and has the defining 
property that the roots of $>e(j(E),Y) are precisely the j-invariants of the elliptic 
curves E that are related to E by a cyclic iV-isogeny. In this paper N = t is prime, 
in which case &e(X, Y) has degree I + 1 in each variable. 

If E is given by a short Weierstrass equation Y 2 = X 3 + a^X + a§, then tp can 
be expressed in the form tp(x,y) — (ipi(x), cy-^tp\{x)) for some c € F . When 
c = 1 we say that tp and its image are normalized. Given a finite subgroup G of 
E(¥), a normalized isogeny with G as its kernel can be constructed using Velu's 
formulae [32], along with an explicit equation for its image E. Conversely, suppose 
we are given a root j — j (E) of (pi (Y) = &e (j (E) 1 Y) , and also the values of <&x (j, I) , 
®y{j,3), ®xx{j,3), $xy{3,3), and $yy(j,j), where j = j(E) and 

®X = Sc^t' ®Y = W^' ^XX = S^®!!, ®XY = eTdY®^ ®YY = W72$l- 

To this data we may apply an algorithm of Elkies [9] that computes an equation 
for E that is the image of a normalized ^-isogeny tp: E — > E, along with an explicit 
description of its kernel: the monic polynomial hg(X) whose roots are the abcissae 
of the non- trivial points in ker?/;; see [14, Alg. 27]. The quantities &xx{j,J), 
$xy (j, j) ) and <&yy {j, I) are not strictly necessary; the equation for E depends 
only on j, j, $x{j,J) and &y{j, j), and we may then apply algorithms of Bostan et 
al. [4] to compute hg(X) (and an equation for ip) directly from E and E. 



2.2. Explicit CM theory. Recall that the endomorphism ring of an ordinary 
elliptic curve E over a finite field F p is isomorphic to an order O in an imaginary 
quadratic field K. In this situation E is said to have complex multiplication (CM) 
by O. The elliptic curve E/¥ p is the reduction of an elliptic curve E/C that also 
has CM by O. The j-invariant of E generates the ring class field Kq of O, and 
its minimal polynomial over K is the Hilbert class polynomial Hq € Z[X], whose 
degree is the class number h(0). The prime p splits completely in Kq, equivalcntly 
Ap = f 2 — v 2 disc(O) for some i,»eZ, and Hq mod p splits completely in F P [X]. 
We define the set 

Ello(F p ) = {j(E) e F p : End(£) ~ O}, 

which consists of the roots of Hq in F p . Let 1: O End(S) denote the canonical 
embedding. The ideals of O act on Ello(F p ) via isogenies as follows. Let a be an 
O-ideal of norm N, and define E[a] — n aeo ker i{a). Then there is a separable N- 
isogeny from E to E = E/E[a], and the action of a sends j{E) to j(E). Principal 
ideals act trivially, and this induces a regular action of the class group c\(0) on 
Ello(Fp). Thus EUo(Fp) is a principal homogeneous space, a torsor, for cl(O). 

Writing the cl(C)-action on the left, if o has prime norm £, then &e(j, [a] j) = for 
all j e Ello(Fp). Provided that I does not divide v = v(p), then (f>e(Y) = $ e (j,Y) 
has either one or two roots in ¥ p , depending on whether I ramifies or splits in K. In 
the latter case, the two roots [a]j and [a _1 ]j can be distinguished using the Elkies 
kernel polynomial he(X), as described in [5, §5] and [15, §3]. 
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2.3. Polycyclic presentations. In order to efficiently realize the action of cl(O) 
on Ello(Fp), it is essential to represent elements of cl(O) in terms of a set of 
generators with small norm. We will choose O so that c\{0) is generated by ideals 
of norm bounded by O(l), via [7, Thm. 3.3], but these generators will typically not 
be independent. Thus as explained in [28, §5.3], we use polycyclic presentations. 

Any sequence of generators ol = (a>i, . . . , at) for a finite abelian group G defines 
a polycyclic series 

1 = Go < Gi < . . . < Gk-i < G k = G, 
with Gi = (ai, . . . , Q-i), in which every quotient Gj/Gj_i ~ (a^) is necessarily cyclic. 
We associate to a the sequence of relative orders r(a) — (ri, . . . , r^) defined by 
r, = |Gj : Gj_i|. Every element /3 G G has a unique a- representation of the form 

p = a e = < ■■■a e k k (0<e t <n). 

We also associate to a the matrix of power relations s(a) = [s^] defined by 

a?=a' 1 i - 1 al i -'---a?: i 1 - 1 (0<s ij<rj ), 

with Sij = for i < j. 

We call a, together with r(a) and s(a), a (polycyclic) presentation for G. A 
generic algorithm to compute a polycyclic presentation is given in [28, Alg. 2.2]. 
Having constructed such an a, we may efficiently enumerate G = cl(O) (or the 
torsor Ello(Fq), given a starting point), by enumerating a-representations. 

2.4. Explicit CRT. Let pi, . . . ,p n be primes with product M, let Mi — M/pi, and 
let aiMi = 1 mod pi. If c e Z satisfies c = Cj mod pj, then c = ^ aaiMi mod M. 
If M > 2|c|, this congruence uniquely determines c. This is the usual CRT method. 

Now suppose M > 4|c| and let q be a prime (or any integer). Then we may 
apply the explicit CRT mod q [1, Thm. 3.1] to compute 

(1) C = °i a i M i ~ rM ) mod 9> 

i 

where r is the closest integer to X^i c j°j/-Pi! when computing r, it suffices to ap- 
proximate each CiOi/pi to within l/(4n), by [1, Thm. 2.2]. 

As described in [28, §6], we may use the explicit CRT to simultaneously compute 
c mod q for many integers c (the coefficients of <j>t, for example), using an online 
algorithm. We first precompute the a* and a^Mi mod q. Then, for each prime Pi, 
we determine the values c, for all the coefficients c (by computing mod pi), 
update two partial sums for each coefficient, one for ^2 CiaiMi mod q and one for 
*Yli c i a ilVii an d discard the c,'s. When the computations for all the pi have been 
completed (these may be performed in parallel), we compute r and apply (1) for 
each coefficient. The space required by the partial sums is just O(logg) bits per 
coefficient. See [28, §6] for further details, including algorithms for each step. 

2.5. Modular polynomials via isogeny volcanoes. For distinct primes I and p, 
we define the graph of l-isogenies r>(F p ), with vertex set F p and edges (ji,j2) 
present if and only if <f>t(ji,j2) — 0. Ignoring the connected components of and 
1728, the ordinary components of r^(F p ) are l-volcanoes [13, 19], a term we take 
to include cycles as a special case [28]. In this paper we focus on I- volcanoes of a 
particular form, for which we can compute modp very quickly, via [7, Alg. 2.1]. 

Let O be an order in an imaginary quadratic field K with maximal order Ojf, 
and let t be an odd prime not dividing [Ok ■ O]. Assume D — disc(O) < —4. 
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Suppose p is a prime of the form 4p = t 2 — £ 2 v 2 D with p = 1 mod £ and £ J( v; 
equivalently, p splits completely in the ray class field of conductor £ for O and does 
not split completely in the ring class field of the order with index £ 2 in O. Then 
the components of Ti(¥ p ) that intersect Elle>(F p ) are isomorphic £- volcanoes with 
two levels: the surface, whose vertices lie in ElloQFp), and the floor, whose vertices 
lie in Elle>/(F p ), where O' is the order of index £ in O. Each vertex on the surface 
is connected to 1 + (y) = 0, 1 or 2 siblings on the surface, and £ — (y) children on 
the floor. An example with £ — 7 and d = 2 is shown below: 



Provided h(0) > £ + 2, this set of ^-volcanoes contains enough information to 
completely determine mod p. This is the basis of the algorithm in [7, Alg. 2.1], 
which we adapt here. Selecting a sufficiently large set of such primes p allows one 
to compute over Z (via the CRT), or modulo an arbitrary prime q (via the 
explicit CRT). In order to achieve the best complexity bounds, it is important to 
choose both the order O and the primes p carefully. We thus introduce the following 
definitions, in which sq denotes the squarefree part of [Ok '■ 0\. 

Definition 1. A suitable family of orders assigns to each odd prime £ an 
imaginary quadratic order O such that 

(i) 4 < |disc(0A')l < c and sq < c and gcd(se>> 2 disc(O^)) = 1. 

(ii) I XpK'-O] and£ + 2< h{0) < c£ and £ 2 < disc(O) < c£ 2 , 
for some absolute constant c. 

This definition combines the criteria in [7, Def. 4.2] and [7, Thm. 5.1]. When 
we say that an order O is suitable for £, we mean that is has been assigned to £ 
by a suitable family of orders. An example of a suitable family of orders using 
discriminants of the form —7 • 3 2 ™ is given in [7, Ex. 4.3]. 

Definition 2. A prime p is suitable for £ and O if p = 1 mod £ and p satisfies 
4p = t 2 - £ 2 v 2 disc(C) for some t, v e Z with £ J( v and oj(v) < 2 log(log v + 3) . 

The function u>(v) counts the distinct prime divisors of v. The bound on u>(v) 
ensures that if O is suitable for £ then many small primes split in O and do not 
divide sq or v. Such primes allow us to more efficiently enumerate cl(O) and cl(O'). 

2.6. Selecting primes with the GRH. In order to apply the isogeny volcano 
method to compute <&£ mod q (or <pi mod q, as we shall do), we need a sufficiently 
large set S of suitable primes p. We deem S to be sufficiently large whenever 
^ pgS logp > B + log 4, where B is an upper bound on the logarithmic height of 
the integers whose reductions mod q we wish to compute with the explicit CRT. 
For $>e(X, Y) — J2i j aijX l Y 3 , we may bound h($i) = logmax^ j \a%j\ using 

(2) h{<5>?;) <Q£\og£+m, and h(®t) < 6£\og£ + 16£ + 14v^log£, 

as proved in [8] (we prefer the latter bound when £ > 3187). 3 

Heuristically (and in practice), it is easy to construct the set S. Given an order O 
of discriminant D suitable for £, we fix v — 2 if D = 1 mod 8 and v = 1 otherwise, 

^The bound 6£log£+ 17£ in [7, p. 1214] is a misprint (but conjecturally true). 
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and for increasing t = 2 mod £ of correct parity we test whether p = (t 2 — v 2 £ 2 D)/A 
is prime. We add each prime value of p to S, and stop when S is sufficiently large. 

Unfortunately, we cannot prove that this method will find any primes, even under 
the GRH. Instead, we use Algorithm 6.2 in [7], which picks an upper bound x 
and generates random integers t and v in suitable intervals to obtain candidate 
primes p = (t 2 — v 2 £ 2 D)/4: < x that are then tested for primality. The algorithm 
periodically increases x, so its expected running time is 0(B 1+e ), even without the 
GRH. To ensure that the bound on oj(v) in Definition 2 is satisfied, unsuitable w's 
are discarded; this occurs with negligible probability. 

Under the GRH, there are effective constants ci, c 2 > such that x > ci£ 6 \og 4 £ 
guarantees at least C2^ 3 log 3 £ suitable primes less than x, by [7, Thm. 4.4]. Asymp- 
totically, this is far more than the 0(£) primes we need to compute $^ mod q. Here 
we may consider larger values of B, and in general, x = 0(B 2 + ^ 6 log 4 ^) suffices. 
We note that S contains 0(B/ log B) primes (unconditionally), and under the GRH 
we have \ogp = 0(\og B + \og£) for all p £ S. 

3. Algorithms 

Let q be a prime and let E be an elliptic curve over ¥ q . A simple algorithm to 
compute 4>e{Y) = $ e (j(E), Y) £ ¥ q [Y] with the explicit CRT works as follows. Let j 
be the integer in [0,q — 1] corresponding to j(E) eF,~ Z/qZ. For a sufficiently 
large set S of suitable primes p, compute §t{X, Y) mod p using the isogeny volcano 
algorithm and evaluate &e(j, Y) mod p to obtain 4>e £ ¥ P [Y], and use the explicit 
CRT mod q to eventually obtain fa £ ¥ q [Y}. 

This naive algorithm suffers from two significant defects. The most serious is that 
the we may now require a much larger set S than is needed to compute <£>^ mod q. 
Compared to the coefficients of <fv, which have height h($e) — 0(£log£) bounded 
by (2), we now need to use the 0(£\og£ + £logq) bound 

(3) h{*tO, Y)) < h(* e ) + (£+!) log q + \og(£ + 2), 

since $>e(j, Y) involves powers of j up to f +1 . 

If log q is comparable to log^, then the difference between the bounds in (2) 
and (3) may be negligible. But when log q is comparable to £, using the bound in (3) 
increases the running time dramatically. This issue is addressed by Algorithm 1. 

The second defect of the naive algorithm is that although its space complexity 
may be significantly better than the 0(£ 2 log q) space required to compute mod q, 
it is still quasi-quadratic in I. But the size of (f>e is linear in £, so we might hope to 
do better, and indeed we can. This is achieved by Algorithm 2. 

In general, we cannot address both issues simultaneously, but when log q w £ (as 
in point-counting), Algorithm 1 is nearly optimal, and when logq = 0(log 2 £) (the 
case when computing endomorphism rings), Algorithm 2 is nearly optimal. 

3.1. Algorithm 1. The increase in the height bound from (2) to (3) is caused by 
the fact that we are exponentiating in the wrong ring. Rather than lifting j (E) £ ¥ q 
to the integer j and computing powers of its reduction in ¥ p (which simulates 
powering in Z), we should instead compute powers j(E),j(E) 2 , . . .j(E) l+1 in ¥ q , 
lift these values to integers Xi, X2, . . . , x^+i, and work with their reductions in F p . 
Of course the reductions of xi, X2, . . . , x^+i need not correspond to powers of any 
particular element in F p ; neverthless, if we simply replace each occurrence of X' 1 in 
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the modular polynomial <&e(X, Y) modp with x% mod p, we achieve the same end 
result using a much smaller height bound. 

We now present Algorithm 1 to compute 4>{Y) — (f>i(Y) = $t(j(E), y). If desired, 
the algorithm can also compute the polynomials (f>x(Y) = (d$>e/dX)(j(E), Y) and 
<j>xx(Y) — (d 2 ^i/dX 2 )(j(E) 1 Y) 1 which may be used to compute normalized iso- 
genics, as described in §3.7 below. These optional steps are shown in parentheses. 

Algorithm 1 

Input: An odd prime £, a prime q, and j(E) e ¥ q . 

Output: The polynomial 4>(Y) = $ e (j(E),Y) e ¥ q [Y] (and (f>x(Y) and 4>xx(Y)). 

1. Select an order O suitable for £ and a set of suitable primes 5* (see §2.6), 
using the height bound B = U\og£ + IU + \ogq + 3\og(£ + 2). 

2. Compute the Hilbert class polynomial Hq(X) via [28, Alg. 2]. 

3. Perform CRT precomputation mod q using S (see §2.4). 

4. Compute integers Xi e [0, q — 1] such that Xi = j(E) 1 mod q, for < i < I + 1. 

5. For each prime p G S: 

a. Compute $^(A, F) modp using Ho, via [7, Alg. 2.1]. 

b. Compute 4>(Y) = a ijXi Y^ modp, where $ t (X,Y) = Eij <',,.\')' 1 - 

c. (Compute 4>x0^) — io-ij^iY^ modp 

and also 4>xx(Y) = J2ij «(* - l)a i j^ i F- 3 modp). 

d. Update CRT sums for each coefficient a of (f> (and of (f>x and <f>xx)- 

6. Perform CRT postcomputation to obtain </> (and 4>x and 4>xx) mod q. 

7. Output 4> (and <j)x and 4>xx)- 

Proposition 3. The ouput <fr(Y) of Algorithm 1 is equal to $e(j(E),Y). 
(and cj> x (Y) = (d$t/dX)(j(E),Y) and <p xx (Y) - (d 2 $i/ dX 2 )(j(E), Y)). 

Proof. Let </>* = $e{j,Y) G V q [Y]. Let i, £ Z be as in step 4. Write $ £ as 
£ij a t3 X l Y^ with ffi!J e Z and let <j) = a^xtfi e Z[Y]. Then 0* = $ mod g, 
and 4> = <f) mod p. To prove <f) = <f)* , we only need to show h((f>) < B. We have 

|Z!i a ii^| < (^ + 2)qcxp/i($ £ ), 
for < j < £ + 1, hence h(<f>) < B. The proofs for (f>x and (f>xx are analogous. □ 

Theorem 4. Assume the GRH. The expected running time of Algorithm 1 is 
0(i 2 B log 2 Slog log B), where B = 0(1 \ogl + log q) is as specified in step 1. It 
uses 0(1 log q + £ 2 log B) space. 

Proof. We use M (n) = 0(n log n Hog n) to denote the cost of multiplication [22] . For 
step 1 we use the family of orders O with discriminants of the form D = — 7 ■ 3 2 ™, 
as in [7]. The expected time for step 2 is 0(B 1+e ) using 0(B) space; see §2.6. 
Step 3 uses 0(£ 2+e ) expected time and 0(£(\og£ + logq)) space, by [28, Thm. 1], 
since h(D) — 0(£). An analysis as in [28, §6.3] shows that the total cost of all 
CRT operations is 0(£M(B) log B) time and 0(£\ogq) space. Step 5 clearly uses 
0(£M(\ogq)) time and 0(£\ogq) space. 

The set S contains 0(Bj log B) primes p, and under the GRH, logp = 0(log B); 
see §2.6. The cost per p is dominated by step 6a, which takes 0(£ 2 log Bllogi?) 
expected time and 0(£ 2 log B) space, by [7]. This yields an 0(£ 2 B log 2 B log log B) 
bound for step 6, which dominates, and the total space is 0(£ log q + £ 2 log B) . □ 
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3.2. Algorithm 2. We now present Algorithm 2, which for q > £ has optimal space 
complexity 0(£ log q). When q is reasonably small, say logq = o(\og 2 £), Algorithm 2 
is also faster than Algorithm 1, but when log q is large it may be much slower, since 
it uses the same height bound (3) as the naive approach. The computation of 
4> £ F p [y] is more intricate, so we present it separately as Algorithm 2.1. Unlike 
Algorithm 1, it is not so easy to also compute <j)x and <pxx, but an alternative 
method to compute normalized isognies using Algorithm 2 is given in §3.7. 
Algorithm 2 

Input: An odd prime £, a prime q, and j(E) £ ¥ q . 
Output: The polynomial 4>{Y) = $ e (j(E),Y) £ ¥ q [Y]. 

1. Select an order O suitable for £ and a suitable set of primes S (see §2.6), 
using the height bound B = U\og£ + 181 + (£ + 1) logq + log{£ + 2). 

2. Compute the Hilbert class polynomial Hq via [28, Alg. 2]. 

3. Perform precomputation for the explicit CRT mod q using S. 

4. Let j be the integer in [0, q — 1] congruent to j{E) mod q. 

5. For each prime p £ S: 

a. Compute 0(F) = ^e(j,Y) modp using O and Hq via Algorithm 2.1. 

b. Update CRT sums for each coefficient a of (f>. 

6. Perform postcomputation for the explicit CRT to obtain <fr € ^M-^]- 

7. Output cj). 

Proposition 5. The ouput <p(Y) of Algorithm 1 is equal to $e(j(E),Y). 
Proof. This follows immediately from Proposition 8 below and the bound 



h($ e (j, Y)) - logmax. lEi <Hj?\ < log(£ + 2) + {i + 1) logq + h(* t ) < B. 



Theorem 6. Assume the GRH and that Xogq = 0(£ k ) for some constant k. The 
expected running time of Algorithm 2 is 0(£ 3 (\ogq + log I?) logfllog^lllog 2 ^) and it 
uses 0(£ log q + £ log £) space. 

Proof. As in the proof of Theorem 4, the expected running time is dominated by 
the time to compute 4>(Y), which by Proposition 9 is 0(£ 2 log 2 p Hog p) . There are 
0(B/\ogB) primespG 5, and under the GRH we have logp = 0(log B) =0(\og£). 
The space complexity is dominated by the 0{B) = 0{£ log £ + £ log q) size of S. □ 



3.3. Algorithm 2.1. The algorithm in [7, Alg. 2.1] computes $^ mod p by enu- 
merating the sets E11 (F P ) and E11 ,(F P ), where O' = Z + £0, the latter of which 
contains approximately £ 2 elements. To achieve a space complexity that is quasi- 
linear in £, we cannot afford to store the entire set Ello'(F p ). We must compute 
Y) mod p using an online algorithm, processing each ji~ G Eile>/(F P ) as we 
enumerate it, and then discarding it. Let us consider how this may be done. 

Let j/i, ... , XfhUD) be the elements of EUe>(F p ), as enumerated using a polycyclic 
presentation a for cl(O). Each yi is £-isogenous to a set Ni of siblings in E11q(F p ), 
and to a set Cj of children in Ello/(F p ); see §2.5. Thus we have 



The siblings can be readily identified in our enumeration of Elle>(F p ) using the CM 
action (see §2.2). To identify the children, we need to be able to determine, for any 



on the height of $ £ (j, Y) £ Z[Y}. 



□ 



MX,Vi)= (ft (X -3))(H(X -J)). 
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given j e O', the set Ci in which it lies. Each d is a subset of the torsor Elle>/(F P ) 
corresponding to a coset of the subgroup C C cl(O') generated by the ideals of 
norm £ 2 ; indeed, two children have the same parent if and only if they are related 
by an isogeny of degree £ 2 (the composition of two I- isogenics) . 

The group cl(O') acts on the cosets of C, and we need to compute this ac- 
tion explicitly in terms of the polycyclic presentation (3 used to enumerate cl(O'). 
This problem is addressed by a generic group algorithm in the next section that 
computes a polycyclic presentation 7 for the quotient cl(C)/C, along with the 
7-representation of the image of each generator in (3. 

As we enumerate the elements jk of Elle>/(F p ), starting from a child j\ of y\ 
obtained via Velu's algorithm, we keep track of the element Sk € cl(O') whose 
action sends j\ to jk- The image of Sk in cl(0')/C is the coset of C corresponding 
to the set d containing jk, and we simply identify d with the ith clement of 
cl(0')/C as enumerated by 7 (in the lexicographic ordering of 7-representations) . 

Thus we can compute the polynomials (j>%(X) = $e(X,yi) as we enumerate 
Elle>/(F p ) by accumulating a partial product of linear factors for each But 
since our goal is to evaluate Zi — <pi (j) mod p, we simply substitute x = j mod p 
into each linear factor, as we compute it, and accumulate the partial product in Z{. 

Having computed the values Zi for 1 < i < I + 2, we interpolate the unique 
polynomial <fi(Y) of degree at most £ + 1 for which (j>(yi) — Zi, using Lagrange 
interpolation. This polynomial must be &e(j, Y). We now give the algorithm. 

Algorithm 2.1 

Input: An odd prime £, a suitable order O, a suitable prime p, and x € F p . 
Output: The polynomial (f>(Y) = <$> e (x, Y) e ¥ P [Y]. 

1. Compute presentations a. of cl(O) and (3 of cl(O') suitable for p (see below). 

2. Represent generators of the subgroup C C cl(O') defined above in terms of (3. 

3. Compute the presentation 7 of c\{0')/C derived from (3, via Algorithm 3. 

4. Find a root x\ of Hq mod p (compute Hq mod p if needed). 

5. Enumerate Elle>(F 9 ) as w\,W2, ■ ■ ■ ,Wh(o) using a. 

6. Obtain j\ € Ello'(F g ) from w\ using Velu's algorithm. 

7. Set Zi<-1 and yi <- null for 1 < i < £ + 2. 

8. For each jk — 5k{ji) in Ello'(F 9 ) enumerated using (3: 

a. Compute the index i of [5k] in the 7-enumeration of cl(C)/C. 

If i > I + 2 then proceed to the next jk, skipping steps b and c below. 

b. If yi — null then set yi to the f-parent of jk (via Velu's algorithm) and for 
each ^-sibling j of yi in Ello(F p ) set Zi Zi(x — j). 

c. Set Zi <- Zi(x - j k ). 

9. Interpolate <j> G F p [V] such that deg </> < £ + 1 and <ft(yi) = Zi for 1 < i < £ + 2. 
10. Output <j). 

The value null assigned to yt in step 7 is used to indicate that the value of yi is 
not yet known. Each yi is eventually set to a distinct Wj € Ello(F p ). 

Remark 7. In practical implementations, Algorithm 2 selects the primes p € S so 
that the presentations a, (3, and 7 are the same for every p and and precomputes 
them (the only reason they might not be the same is to avoid prime ideals whose 
norm divides v = v(p), but in practice we fix v < 2, as discussed in §2.6). 

Proposition 8. Algorithm 2.1 outputs 4>(Y) — $>e(x,Y) mod p. 
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Proof. The values yi € Elle>(F p ) are necessarily distinct. It follows from the dis- 
cussion above that Algorithm 2.1 computes = 3>e(x, yi) for 1 < i < £ + 2. Thus 
<t>*(Y) = $e(x,Y) satisfies (j>*{yi) — Zi for 1 < i < £ + 2, as does (j>(Y), and both 
polynomials have degree at most £ + 1. Therefore <j> — 4>* . □ 

Theorem 9. Assume the GRH. Algorithm 2.1 runs in 0(£ 2 n 2 log 2 nllog 2 n) expected 
time using 0(£n) space, where n — \ogp. 

Proof. The time complexity is dominated by step 8, which enumerates the 0{£ 2 ) 
elements of Ello/(F p ) using [3. By [7, Thm. 5.1] and the suitability of O and p, we 
may assume each /?, = [hi], where bi has prime norm &j = O(lognllogn). Using 
Kronecker substitution and probabilistic root-finding [33] , the expected time to find 
the (at most 2) roots of ^b i (jk,Y) is 0(nM(nlognllogn)), which dominates the 
cost for each jk ■ Applying M (n) = 0(n log n Hog n) and multiplying by £ 2 yields the 
desired time bound. Taking into account h(0) = 0(£) and p > I, the computation 
of Ho mod p uses 0(£n) space, by [28, Thm. 1] , and this bounds the total space. □ 

3.4. Computing a polycyclic presentation for a quotient group. We now 

give a generic algorithm to derive a polycyclic presentation 7 for a quotient of finite 
abelian groups G/H. This presentation can be be used to efficiently compute in 
G/H, and to compute the image of elements of G, as required by Algorithm 2.1. 

Algorithm 3 

Input: A minimal polycyclic presentation (3 = (/3i, . . . , /3k) for a finite abelian 
group G and a subgroup H — (a\, . . . , a t ), with each on specified in terms of (3. 
Output: A polycyclic presentation 7 for G/H, with ji = [Pi] for each e (3. 

1. Derive a polycyclic presentation a. for H from a\, . . . , a t via [28, Alg. 2.2]. 

2. Enumerate H using a. and create a lookup table Tjj to test membership in H. 

3. Derive a polycyclic presentation 7 for G/H from \J3\], [j3k] via [28, Alg. 2.2], 
using Th as described below. 

4. Output 7, with relative orders r(-y) and relations s(-y). 

The polycyclic presentation 7 output by Algorithm 3 is not necessarily minimal. 
It can be converted to a minimal presentation by simply removing those 7j with 
r{"fi) = 1, however, for the purpose of computing the image in G/H of elements 
of G represented in terms of (3, it is better not to do so. 

The algorithm in [28, Alg. 2.2] requires a TableLookup function that searches 
for a given group element in a table of distinct group elements. In Algorithm 3 
above, the elements of G are uniquely represented by their /3-representations, but 
elements oiG/H are represented as equivalence classes [5], with S <G G, which is not 
a unique representation. To implement the TableLookup function for G/H, we do 
the following: given [So] € G/H and a table Tq/h of distinct elements [Si] in G/H, 
we test whether <5o^ rl *= ^> f° r eacn *= T. With a suitable implementation 
of Th (such as a hash table or balanced tree), membership in H can be tested in 
0(log |G|) time, which is dominated by the 0(log 2 |G|) time to compute 5q5~[ . 

The problem of uniquely representing elements of G/H is solved once Algo- 
rithm 3 completes: every element of G/H has a unique 7- representation. 

Theorem 10. Algorithm 3 runs in 0(nlog 2 n) time using 0((m+n/m) logn) space, 
where n = \G\ and m = \H\. 
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Proof. The time complexity is dominated by the n/m calls to the TableLookup 
function performed by [28, Alg. 2.2] in step 3, each of which performs m opera- 
tions in G (using /3-representations) and m lookups in Th, yielding a total cots of 
0(nlog 2 n). The space bound is the size of Th plus the size of T G / H . □ 

3.5. Other modular functions. For a modular function g of level N and a prime 
I I N, the modular polynomial $f, is the minimal polynomial of the function g(£z) 
over the field C(g). For suitable functions g, the isogeny volcano algorithm for 
computing $e(X, Y) can be adapted to compute ^(X,Y), as described in [7, §7]. 
There are some restrictions: $| must have degree I + 1 in both X and Y, and we 
require some additional constraints on the suitable orders O that we use. Specifi- 
cally, we require that there is a generator r of O for which g(r) lies in the ring class 
field Kq. In this case we say that g{r) is a class invariant, and let H^{X) denote 
its minimal polynomial over K; see [6, 11, 12] for algorithms to compute Hq(X). 
We also require the polynomial H g Q to be defined over Z. 

With this setup, there is then a one-to-one correspondence between the roots 
j(t) of H and the roots g(r) of H 9 Q in which ^ 3 (g(r), j(r)) = 0, where * 9 
is the minimal polynomial of g as an element of C(j); note that VP 9 does not 
depend on I and is assumed to be given. The class group cl(O) ~ G&1(K o/K) acts 
compatibly on both sets of roots, and this allows us to compute $ 9 modulo suitable 
primes p using essentially the same algorithm that is used to compute <S>i mod p. 
In particular, we can enumerate the set Ell 9 9 (F p ) = {x e F p : Hq(x) = 0} using 
a polycyclic presentation a for cl(O), provided that we exclude from a generators 
whose norm divides the level of g, and similarly for Ell^, (F p ), where O' = Z + lO. 

Thus Algorithms 1 and 2 can both be adapted to compute instantiated modular 
polynomials 4> 9 (Y) = $ 9 (x,y) mod q. Some effort may be required to determine 
the correspondence between Ello(F p ) and Ell 9 3 (F p ) in cases where ^ 9 (X 7 j{E)) or 
^ 9 (g(E), Y) has multiple roots in F p ; this issue arises when we need to compute a 
child or parent using Velu's algorithm. There are several techniques for resolving 
such ambiguities, see [7, §7.3] and especially [12], which explores this issue in detail. 

We emphasize that the point x at which we are evaluating $| (a;, Y) may be any 
element of F 9 , it need not correspond to the u g- invariant" of an elliptic curve. 4 This 
permits a very useful optimization that speeds up our original version of Algorithm 1 
for computing (/>e(Y) = 4> 3 e (Y) by a factor of at least 9, as we now explain. 

3.6. Accelerating the computation of 4>t(Y) using 72. Let 72(2) be the unique 
cube- root of j(z) with integral Fourier expansion, a modular function of level 3 that 
yields class invariants for O whenever 3 / disc(O). As noted in [7, §7.2], for I > 3 
the modular polynomial can be written as 

(4) ^J 2 (X,Y) = R(X 3 ,Y 3 )Y e + S(X 3 ,Y 3 )XY + T(X 3 ,Y 3 )X 2 Y 2 - e , 
with e = I + 1 mod 3 and R,S,Te Z[X, Y]. We then have the identity 

(5) <S> £ = R 3 Y e + (S 3 - 3RST)XY + TX 2 Y 2 ~ e . 

When computing <i>J 2 mod p with the isogeny volcano algorithm, one can exploit (4) 
to speed up the computation by at least a factor of 3. In addition, the integer 



Every x € ¥ q is j(E) for some E/F q , and when E is ordinary, j(E) is the reduction of some 
j(r) = j(E) with Z[r] = O ~ End(_B). But g(r) might not be a class invariant for this O. 
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coefficients of $J 2 are afso smaller than those of $^ by roughly a factor of 3; we 
may use the height bound h{<£] 2 ) < 2£\og£ + 8£ from [7, Eq. 18]. 

Let us consider how we may modify Algorithm 1 to exploit (5), thereby accel- 
erating the computation of <f>e(Y) — &t(x, Y) mod q, where x = j(E) e ¥ q . Let 
r(Y) — R(x, Y) mod q, and similarly define s and t in terms of S and T. Rather 
than computing $^ mod p in step 5a, we compute $J 2 mod p and derive R, S, and T 
from (4). We then compute polynomials f, s, and t mod p instead of (j> in step 5b. 
Finally, we recover r, s, and t mod q in step 6 via the explicit CRT and output 

(6) <f> = r 3 Y e + x{s 3 - 3rst)Y + x 2 t 3 Y 2 - £ 

in step 7. Adjusting the height bound B appropriately, this yields a speedup of 
nearly a factor of 9. Note that we are not assuming x = j(E) has a cube-root in ¥ q 
or that End(S) ~ O satisfies 3 / disc(O), the identity (6) holds for all x. 

We can similarly compute <j>x and 4>xx- To simplify the formulas, let U = 
(S 3 - 3RST), and define u = U(x,Y) mod q. Define r(Y) = (-^R)(x,Y) and 

r(Y) = (-^ x -2R)(x,Y), and similarly for s,t, and u. Note that u, u, and u can all 
be easily expressed in terms of r, f, r, s, s, s, t, i, and i. We replace the computation 
of cf>x and 4>xx hi step 5c with analogous computations of f, f, s, s, t, and t mod p. 
We then obtain r, f, r, s, s, s, i, i, and t via the explicit CRT mod q and apply 

(7) (j) X = 3r 2 rY e + {u + xu)Y + (2xt 3 + 3x 2 t 2 i)Y 2 - e ; 

(j) XX = (6rf 2 + 3r 2 r)Y e + {2u + u)Y + (2t 3 + \2xt 2 i + Qx 2 ti 2 + 3x 2 t 2 i)Y 2 - e . 

3.7. Normalized isogenies. We now explain how Algorithms 1 and 2 may be 
used to compute normalized isogenies ip, first using j-invariants, and then using 
g-invariants. Throughout this section j = j(E) e ¥ q denotes the j-invariant of a 
given elliptic curve E/¥ q , defined by y 2 = x 3 + Ax + B, and 4>{Y) = §e(j, Y). We 
use j = j(E) to denote a root of 4>{Y) in ¥ q . Our goal is to compute an equation 
for the image of tp: E — >■ E, and the kernel polynomial hg(X) for tp. 

3.7.1. Algorithm 1. When computing <f>, we also compute the optional outputs 4>x 
and fax, and then <fr(Y) = ^(f>{Y), <Pyy(Y) = ^y(r), and cp XY = ^4>x(Y). 
We then compute the quantities J) = <j>*(j), for * = X, Y, XX, XY, YY, as 
defined in §2.1, and apply Elkies algorithm [14, Alg. 27] to compute E and h t {X). 

3.7.2. Algorithm 2. Having computed <p and obtained j, we run Algorithm 2 again, 
this time with the input j, obtaining <j)(Y) = $> e (j,Y), which we now regard as 
4>(X) = $t(X,j), by the symmetry of $£. We then compute $x(j,j) = (^Df <A)(j) 
and $y(j,j) = (jf</>)(J)i and the quantities 

8) 3 = — 7-3, 3 = / ■ ~\ J , m = -, k= — — — = , 

A t<f>Y{3,3) 3 1728 -j 

as in [14, Alg. 27]. The normalized equation for E is then y 2 = x 3 + ^-j^-x + e6 ^ k , 
and the fastElkies' algorithm in [4] may be used to compute he(X). 

3.7.3. Handling g-invariants. We assume that g{E) is known to be a class invari- 
ant (see §3.8 below). Let g = g(E), (j> 9 (Y) = $ 9 e (g,Y), and let g = g(E) de- 
note a root of 4> 9 {Y) in ¥ q . In the case of Algorithm 1 we compute $ 9 x (g,g) — 
(j) 9 x {g) and Qyid'd) ~ i'W < f )9 )(9), an d in the case of Algorithm 2 we make a sec- 
ond call with input g to obtain <f> 9 (X) = $> 9 (X,g) as above. We then compute 
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®x(9>9) = anc * ^(fliff) — (^^XfiO- We assume the modular equa- 

tion (G, J) = relating 5(2) to j(z) can be solved for j(z) (for the g(z) considered 
in [7], degj * 9 (G, J) < 2), and let F(G) satisfy *f (F(J), J) = and F' = ^F. 
To compute the normalized equation for E, we proceed as in (8), except now 

(9) 1 = -($ x (g,~9)F'(~g))/(e$Y(g,~g)/F'(g))j'. 

The fastElkies' algorithm in [4] may then be used to compute he, or, in the case of 
Algorithm 1, one may derive the trace of he using & xx (g, g), $ 9 XY (g, g), <& Y y (9' 9) 
as in §3.7.1, and compute he as usual. We omit the details. 

3.8. Verifying that g{E) is a class invariant. Let E/¥ q be an elliptic curve that 
is not supersingular (see [29] for fast tests), with End(F) ~ O. As in §3.5, we call 
an element g(E) of ¥ q a class invariant if (i) the ring class field of O is the splitting 
field of H a {X), and (ii) g{E) is a common root of H 9 {X) and W{X, j{E)). 

For practical applications, we would like to determine whether g{E) is a class 
invariant without computing O (indeed, the application may be to compute O). 
This is often easy to do, at least as far as condition (i) is concerned. As noted 
in §3.5, (i) can typically be guaranteed by constraints involving D — disc(O) and 
the level N of g. Verifying condition (ii) is more difficult, in general, but it can 
be easily addressed in particular cases if we know *f> 9 (X, j(E)) either has a unique 
root in ¥ q (which then must also be a root of H 9 (0) once (i) is satisfied), or that 
all its roots in ¥ q are roots of H 9 (0), or of H 9 (0) for some g with <f>^ = In the 
latter case we may not determine g{E) uniquely, but for the purposes of computing 
a normalized ^-isogeny this does not matter, any choice will work. 

Taking 72 = ^/J as an example, condition (i) holds when (y) 7^ 0, which means 
j(E) is on the surface of its 3-volcano and has either or 2 siblings. This can be 
easily determined using [13] or [28, 4.1]. If we have q = 2 mod 3, the polynomial 
^ 9 (X, j(E)) = X 3 - j(E) has a unique root g(E) in ¥ q and (ii) also holds. 5 

As a second example, consider the Weber f-function, which is related to the j- 
function by &(X, J) = (X 2A - 16) 3 - A 24 J. Now we require (§) ^ and (f ) = 1. 
The latter is equivalent to j(E) being on the surface of its 2-volacano with 2 siblings. 
If we also have q = 11 mod 12, then tyf(X,j(E)) has exactly two roots f(E) and 
—f(E), by [7, Lemma 7.3], and either may be used since 3^ = $7* ■ 

For a more general solution, having verified condition (i), we may simply compute 
instantiated polynomials <j){Y) = ^e(x,Y) for every root x of ^ 9 (X, j{E)) in ¥ q . 
This can be done at essentially no additional cost, and we may then attempt to 
compute a normalized isogeny corresponding to each root x, which we validate by 
computing the dual isogeny (using the normalization factor c = t rather than 1) and 
checking whether the composition corresponds to scalar multiplication by £ using 
randomly generated points in E(¥ q ). The cost of these validations is negligible 
compared to the cost of computing <j)(Y) for even one x. 

As a final remark, we note that in applications such as point counting where 
one is only concerned with the isogeny class of E, in cases where condition (i) 
is not satisfed, one may be able to obtain an isogenous E for which (i) holds by 
simply climbing to the surface of the relevant ^o-volcanoes for the primes to\N (we 
regard N as fixed so £q is small: £ = 2, 3 in the examples above). 



5 There are techniques to handle q = 1 mod 3, see [6] for example, but they assume O is known. 
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4. Applications 

In this section we analyze the use of Algorithms 1 and 2 in two particular appli- 
cations: point counting and computing cndomorphism rings. 

Recall that for an elliptic curve E/¥ q , an odd prime I is called an Elkies prime 
(for E) whenever <j)(Y) = <&g(j(E),Y) has a root in ¥ q . This holds if and only if 
t 2 — 4q is a square mod £, where t = q + 1 — #E(¥ q ). It follows from the Chebotarev 
density theorem that the set of Elkies primes for E has density 1/2. The complexity 
of the Schoof-Elkies-Atkin algorithm [23] for computing #E(F q ) depends critically 
on the number of small Elkies primes, specifically, the least L = L(E) for which 

(10) ]T log* > log(4V?). 

Elkies primes t<L(E) 

On average, one expects L rts \ogq, but even under the GRH the best proven bound 
is L = 0(log 2+£ (7), see Appendix A of [21] by Satoh and Galbraith. This yields a 
complexity bound that is actually slightly worse than Schoof 's original algorithm. 
In practice, the following heuristic assumption is commonly made. 

Heuristic 1. There exists a constant c such that for all sufficiently large q we have 
L(E) < clogq for every elliptic curve E/¥ q . 

Theorem 11. Assume the GRH and Heuristic 1. Let E/¥ q be an elliptic curve 
over a prime field ¥ q and let n = logq. There is a Las Vegas algorithm to compute 
#E(W q ) that runs in 0(n 4 log 2 n Hog n) expected time using 0(n 2 logn) space. 

Proof. Apply the SEA algorithm, using Algorithm 1 to compute 4>{Y) = $>t(j{E),Y) 
(and also 4>x and <fixx), and ignore the Atkin primes, as in [24, Alg. 1], for ex- 
ample. There are 0(n/ log n) primes in the sum (10), and under Heuristic 1, they 
are bounded by L = 0(n). It follows from [24, Table 1] that the expected time to 
process each Elkies prime given <fi is 0(n 3 log 2 n Hog n), which is dominated by the 
time to compute <fi, as is the space. The theorem then follows from Theorem 4. □ 

A common application of the SEA algorithm is to search for random curves of 
prime (or near prime) order, e.g., for use in cryptographic applications. As shown 
in [24], we no longer need Heuristic 1 to do this. Additionally, since we expect to 
count points on many curves (~ log q) , we can take advantage of batching, whereby 
we extend Algorithm 1 to take multiple inputs j(E{) € ¥ qi , . . . , j(Ek) £ ¥ qk and 
produce corresponding outputs for each (the ¥ qi may coincide, but they need not). 
Provided k = 0(\og£), this does not change the time complexity (relative to the 
largest ¥ qi ), since the most time-consuming steps depend only on £, not j(E), and 
the space complexity is increased by at most a factor of k. 6 

Let E at b denote the elliptic curve dchned by y 2 = x 3 + Ax + B, and for any real 
number x > 3, let T(x) denote the set of all triples (q, a, b) with q € [x, 2x] prime, 
a, b e ¥ q , and #-E ,fc prime. The following result strengthens [24, Thm. 3] 

Theorem 12. There is a Las Vegas algorithm that, given x, outputs a uniformly 
distributed triple (q,a,b) <E T(x) and the prime #£^ a> ;,(F g ). Under the GRH, its 
expected running time is 0(n 5 log 2 nllogn) using 0(n\og 2 n) space, where n = logx. 

Proof. We modify the algorithm in [24] to use Algorithm 1, operating on batches 
of O(logn) inputs at a time. One then obtains an 0(n 4 log n Hog n) bound on the 



6 These remarks also apply to Algorithm 2. 
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Task CPU time (3.0 GHz AMD) 

Compute 4>l(Y) with Algorithm 1 32 days 

Compute X q mod <f)£ (using [17]) 995 days 

Compute hi using [14, Alg. 27] 3 days 

Compute Y q and X q mod hi, E using [16] 326 days 

Compute the eigenvalue \i using BSCS 22 days 

1378 days 

Table 1. Breakdown of time spent computing #E(F q ) 



average time to compute #E a ^(F q ) for primes q G [x, 2x], and a space complexity 
of 0(nlog 2 n). The theorem then follows from the proof of [24, Thm. 3]. □ 

A second application of Algorithms 1 and 2 is in the computation of the endomor- 
phism ring of an ordinary elliptic curve. The algorithm in [2] achieves a heuristically 
subexponential running time of L[l/2,\/3/2] using L[l/2, l/\/3] space. Both Al- 
gorithms 1 and 2 improve the space complexity bound to £[1/2, l/\/l2], which is 
of practical significance; space is the limiting factor in these computations. Al- 
gorithm 2 also provides a slight improvement to the time complexity that is not 
visible in the L[a,c] notation but may be practically useful. These remarks also 
apply to the algorithm in [18] for evaluating isogenics of large degree. 

5. Computations 

Using a modified version of the SEA algorithm incorporating Algorithm 1, wc 
counted the number of points on the elliptic curve 

y 2 = x 3 + 2718281828a; + 3141592653, 

modulo the 5011-digit prime q = 16219299585 • 2 16612 - 1. The algorithm ignored 
the Atkin primes and computed the trace of Frobenius t modulo 700 Elkics primes, 
the largest of which was I = 11681; see [27] for details, including the exact value 
of t, which is too large to print here. The computation was distributed over 32 
cores (3.0 GHz AMD Phenom II), and took about 6 weeks. 

For £ = 11681, the size of <p\(Y) = $ f e (f(E),Y) was under 20MB and took 
about two hours to compute (on a single core). As can be seen in Table 1, the 
computation of (j>\ accounted for less than 3% of the total running time, despite 
being the asymptotically dominant step. This is primarily due to the use of the 
Weber f-invariant; with a less advantageous invariant (in the worst case, the j- 
invariant with the optimization of §3.6), the time spent computing 4>i would have 
been comparable to or greater than the time spent on the remaining steps. But in 
any case the computation would still have been quite feasible. 

To demonstrate the scalability of the algorithm, we computed (f>[{Y) for an el- 
liptic curve E/W q , with I = 100019 and q = 2 86243 - 1. Running on 32 cores 
(Algorithms 1 and 2 are both easily parallelized), this computation took less than 
a week. Wc note that the size of the instantiated modular polynomial (and (j>i) 
is almost exactly one gigabyte, whereas the size of is many terabytes, and we 
estimate the size of <J>^ to be 20 or 30 petabytes. 
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